From Takeoff to Touchdown: Dissecting Data on Air Disasters

INFO 526 - Project Final

A shiny app integration with aircraft crash analysis
Author: Infographic Innovators - Antonio, Bharath, Eshaan, Thanoosha

Affiliation: School of Information, University of Arizona

Abstract:

This study delves into a comprehensive analysis of aircraft crashes in the United States spanning from 1980 to 2022. It focuses on exploring crash locations, timings, consequences, and the influencing factors behind these incidents. Leveraging a detailed dataset sourced from the National Transportation Safety Board (NTSB), the research utilizes data visualization techniques and time-series analyses to uncover correlations and trends associated with these aviation mishaps.

Introduction:

The objective of this study is to meticulously examine aircraft crashes’ locations, timings, and consequences during the specified time frame. The research aims to discern correlations, if any, contributing to these incidents and to ascertain whether certain regions are more prone to a higher number of crashes.

The research methodology involves a thorough analysis of the NTSB dataset, employing various data visualization tools and statistical analyses. Specifically, a choropleth is generated to visualize crash frequencies across different regions, while a radial bar plot illustrates crashes during specific flight phases. Additionally, the study investigates crash causes and their correlation with the severity of outcomes through bar plots and stacked area charts. A radar plot is utilized to explore crash occurrences concerning weather conditions and months.

Examining Aircraft Crashes, with a focus on their locations, timings, and consequences

Time-series analysis of fatalities and types of injuries

Approach

We analyze the historical data on fatalities and injuries to understand how they have trended over the years. To begin, we created two animations. The first shows the cumulative count of fatalities and injuries over time, using the geom_line() function with an airplane image that marks the leading data point as it moves along the plot. The second breaks casualties down by injury severity, showing how each category has evolved over time.
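The cumulative series behind the first animation can be sketched as follows. This is a minimal, language-agnostic illustration written in Python rather than the project's R code; the record layout `(year, fatalities, injuries)` and the sample values are hypothetical and are not taken from the NTSB dataset.

```python
from collections import defaultdict
from itertools import accumulate

# Hypothetical per-crash records: (year, fatalities, injuries).
# The values are made up for illustration only.
records = [
    (1980, 2, 1),
    (1980, 0, 3),
    (1981, 1, 0),
    (1983, 4, 2),
]

def cumulative_by_year(rows):
    """Return [(year, cumulative_fatalities, cumulative_injuries)] sorted by year."""
    per_year = defaultdict(lambda: [0, 0])
    for year, fatalities, injuries in rows:
        per_year[year][0] += fatalities
        per_year[year][1] += injuries
    years = sorted(per_year)
    # Running totals across years, one series per measure.
    cum_fatal = accumulate(per_year[y][0] for y in years)
    cum_injured = accumulate(per_year[y][1] for y in years)
    return list(zip(years, cum_fatal, cum_injured))

series = cumulative_by_year(records)
for year, fatal, injured in series:
    print(year, fatal, injured)
```

Each row of `series` corresponds to one frame of a line animation: the line is drawn up to that year, and the marker sits on the latest point.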

Findings

There has been a general decrease in the number of total fatalities from 1980 to 2022. A notable spike in fatalities was observed in 2001, attributed to the 9/11 attacks. Post-2001, a significant decline in fatalities was noted.

Choropleth map of the number of crashes in different regions (US map)

Approach

Creating the choropleth map of flight crashes across U.S. regions involves three steps: data preparation, map generation, and animation. The dataset is filtered for valid latitude and longitude values, and the relevant columns are selected. To ensure completeness, the unique states and the full range of years are identified. A function is defined to create the map for a given input year, with a customized color scale. A loop then saves one map per year.
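The data-preparation step can be illustrated with a short sketch. It is written in Python for illustration and is not the project's R code; the record layout, the coordinate-range check, and the zero-filling of missing state-year pairs are assumptions about what "valid" and "completeness" mean here.

```python
from collections import Counter
from itertools import product

# Hypothetical crash records: (state, year, latitude, longitude).
# None marks a missing coordinate; values are illustrative only.
crashes = [
    ("AK", 1980, 61.2, -149.9),
    ("AK", 1980, 64.8, -147.7),
    ("AZ", 1981, 33.4, -112.0),
    ("TX", 1980, None, -97.7),   # dropped: missing latitude
    ("FL", 1981, 95.0, -81.4),   # dropped: latitude out of range
]

def has_valid_coords(lat, lon):
    """Assumed validity rule: both present and within geographic bounds."""
    return (lat is not None and lon is not None
            and -90 <= lat <= 90 and -180 <= lon <= 180)

def counts_per_state_year(rows):
    """Count crashes per (state, year), filling absent combinations with 0."""
    valid = [(s, y) for s, y, lat, lon in rows if has_valid_coords(lat, lon)]
    counts = Counter(valid)
    states = sorted({s for s, _ in valid})
    years = range(min(y for _, y in valid), max(y for _, y in valid) + 1)
    return {(s, y): counts.get((s, y), 0) for s, y in product(states, years)}

grid = counts_per_state_year(crashes)
print(grid)
```

Filling the grid with explicit zeros means every yearly map covers the same set of states, so a state does not disappear from a frame simply because it had no crashes that year.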

The animation is created by loading saved map images, joining them into a sequence, and generating the animation. The resulting animation visually represents the temporal evolution of flight crashes across U.S. states.

Findings

The choropleth map shows that Alaska, Arizona, Texas, and Florida have the highest numbers of crashes.

Radial Bar Plot

Approach

We used a radial bar plot to explore the distribution of flight crashes and associated injuries across different phases of flight. By categorizing flight phases into Landing, Takeoff, Approach, Maneuvering, Climb, and Other, this visualization aims to uncover the critical moments during a flight where incidents are more likely to occur. The first graph shows the count of crashes in each phase, while the second shows the count of injuries, providing a broad perspective on the safety challenges associated with each phase.
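The phase categorization can be sketched as below. This is a Python illustration, not the project's R code; the raw phase labels and the normalization rule (trim whitespace, title case, anything unrecognized becomes "Other") are assumptions.

```python
from collections import Counter

# The five named phases from the text; everything else is lumped into "Other".
MAIN_PHASES = {"Landing", "Takeoff", "Approach", "Maneuvering", "Climb"}

def lump_phase(raw):
    """Normalize a raw phase label and map unrecognized or missing values to 'Other'."""
    phase = (raw or "").strip().title()
    return phase if phase in MAIN_PHASES else "Other"

# Hypothetical raw labels, including a missing value.
raw_phases = ["LANDING", "takeoff", "Cruise", None, "Landing", "Taxi", "climb"]

phase_counts = Counter(lump_phase(p) for p in raw_phases)
print(phase_counts)
```

The resulting counts per category are what the bars of a radial bar plot would encode, one bar per phase.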

Findings

The radial bar plot offers an intuitive and visually appealing representation of the distribution of crashes and injuries throughout various phases of flight. In the first graph, the bars radiating from the center depict the count of crashes in each phase, allowing for a quick comparison of their frequencies. This visualization enables the identification of phases that might be particularly prone to incidents, guiding further investigation into the contributing factors.

The second graph, depicting injuries, provides an additional layer of analysis. By comparing the counts of injuries across different flight phases, we can discern whether certain phases are more likely to result in severe consequences. This insight is crucial for understanding the potential risks associated with specific segments of a flight, informing safety measures and protocols.

Analysis of Causes of Crashes

Waffle chart

Approach

We prepare the data by calculating the percentage of airplane crashes attributed to each cause and the corresponding number of tiles. We then generate a waffle chart with ggplot2, in which causes are represented by ‘✈’ symbols in a grid layout and the number of tiles for each cause is proportional to its percentage. Finally, we convert the chart into an interactive plot using plotly.
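One way to turn percentages into whole tiles is largest-remainder rounding, sketched below in Python. The source does not state which rounding rule the project used, so this is an assumption; the cause labels and counts are placeholders, not results from the dataset.

```python
def waffle_tiles(counts, total_tiles=100):
    """Allocate whole tiles proportionally so that they sum to total_tiles.

    Uses integer arithmetic: each cause gets the floor of its exact share,
    and leftover tiles go to the causes with the largest remainders.
    """
    total = sum(counts.values())
    shares = {k: divmod(v * total_tiles, total) for k, v in counts.items()}
    tiles = {k: quotient for k, (quotient, _) in shares.items()}
    leftover = total_tiles - sum(tiles.values())
    by_remainder = sorted(shares, key=lambda k: shares[k][1], reverse=True)
    for k in by_remainder[:leftover]:
        tiles[k] += 1
    return tiles

# Placeholder counts per cause (hypothetical, for illustration only).
cause_counts = {"Cause A": 617, "Cause B": 254, "Cause C": 88, "Cause D": 41}

tiles = waffle_tiles(cause_counts)
print(tiles)
```

Rounding each percentage independently can leave the grid one or two tiles short or over; allocating the leftover tiles by remainder keeps the grid at exactly 100 tiles.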

Findings

The waffle chart effectively showcased the distribution of crash causes, emphasizing the prominence of human error (pilot failures) in aviation incidents. The data highlighted the need for enhanced safety measures and training to address the identified causes of crashes.

Density Plot

Approach

We created the density plot with ggplot’s geom_density function to visually analyze the distribution of flight crashes over the years based on their probable causes. Using the probable_cause_flights dataset and focusing on the cause_summary column, this visualization aims to provide insights into the changing patterns and trends of aviation incidents. The x-axis represents the years, offering a chronological perspective, while the y-axis shows the density of crashes associated with specific causes. Here, we have focused on the attribute Pilot's Failure.
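To make the y-axis concrete, the following Python sketch computes a Gaussian kernel density estimate by hand. It is an illustration of the general technique, not the project's code: the list of crash years is hypothetical, and the fixed bandwidth of 3 years is an assumption (geom_density selects a bandwidth automatically unless one is supplied).

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density at x with a Gaussian kernel."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical years of crashes attributed to a single cause.
crash_years = [1982, 1983, 1983, 1985, 1990, 1991, 2004, 2015]

density = gaussian_kde(crash_years, bandwidth=3.0)

# Evaluate on a half-year grid from 1970 to 2030 and check the area is close to 1.
step = 0.5
grid = [1970 + step * i for i in range(121)]
area = sum(density(x) for x in grid) * step
print(round(area, 3), density(1983) > density(2010))
```

Because the curve integrates to 1, its height reflects the relative concentration of crashes in time, not their absolute count, which is why density plots for causes with very different totals can look similar in height.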

Findings

This visual representation allows us to identify clusters of high density, indicating periods or years where certain causes were more prevalent. Additionally, it facilitates the detection of outliers or shifts in patterns, enabling a more nuanced exploration of the dataset. Here, we can see that injuries in particular have decreased over time, with the number of fatal injuries falling significantly over the past few decades.

Assessing the Influence of Weather Conditions on Crashes

Radar Plot

Approach

We built the radar plot with Plotly for R to assess the influence of weather conditions on flight crashes. It uses the flights_ntsb_radar dataset, which we derived from the original flights_ntsb dataset, and categorizes flight crashes by Visual Meteorological Conditions (VMC) and Instrument Meteorological Conditions (IMC). The visualization seeks to highlight the varying degrees of impact these conditions have on aviation safety. By layering the VMC and IMC data on a single radar plot, we aim to provide a holistic perspective on how different weather scenarios contribute to flight incidents.
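The aggregation behind such a layered radar plot might look like the sketch below. It is written in Python for illustration and is not the project's R code. It assumes the angular axis is the month of the year, following the introduction's mention of "weather conditions and months"; the record layout and sample values are hypothetical.

```python
from collections import Counter

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical records: (month number 1-12, weather condition code).
events = [(1, "VMC"), (1, "IMC"), (7, "VMC"), (7, "VMC"), (12, "IMC"), (3, "UNK")]

def radar_traces(rows, conditions=("VMC", "IMC")):
    """Return one 12-element list of monthly crash counts per weather condition."""
    counts = Counter((cond, month) for month, cond in rows if cond in conditions)
    return {cond: [counts[(cond, m)] for m in range(1, 13)] for cond in conditions}

traces = radar_traces(events)
for cond, values in traces.items():
    print(cond, dict(zip(MONTHS, values)))
```

Each list becomes one trace on the radar plot, with the month as the angle and the count as the radius, so the VMC and IMC shapes can be compared directly.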

Findings

The radar plot serves as an effective means to showcase the multivariate nature of weather conditions and their relationship with flight crashes. Each axis on the radar represents a specific parameter related to aviation safety, such as visibility, cloud cover, wind speed, and temperature. The radar plot allows for the simultaneous comparison of these parameters for VMC and IMC, unveiling patterns and discrepancies in their respective contributions to incidents.

Conclusion

In conclusion, although many crashes still occur every year, the number of crashes has been decreasing over the past few decades. The number of fatalities has also gone down, which we attribute to the stringent rules in the aviation industry. With fatalities and crashes decreasing over time, air travel is becoming a safer as well as faster mode of transport, one that can take us around the globe in a matter of hours.

Overall trends indicate a decrease in total fatalities from crashes over the 1980-2022 period, with a notable spike in 2001 attributed to the events of 9/11. Specific regions like Alaska, Arizona, Texas, and Florida exhibited higher crash frequencies. Pilot error emerged as the predominant cause, followed by mechanical failures. Severity of crashes varied based on different causes, as depicted in the stacked area chart. Furthermore, distinct weather patterns correlated with increased crash occurrences in specific months, as revealed by the radar plot.
